Add StarCoder/SantaCoder example #146
Conversation
Hi, thanks for your work on this. I'm also interested in getting this model working. Were you able to get any code completions from this model in its current state?
Great stuff!
Looking forward to the MQA improvement
Yes @mparrett, please check the collapsible sections. For example, using the prompt …
I am having trouble running the example using the GGML file that I found here: https://huggingface.co/nouamanetazi/starcoder-ggml/tree/main
How much RAM do you have @danforbes? Can you try with …
Ah, yes...it may indeed be a RAM thing. Is there a GGML format of that model floating around somewhere? A quick search on 🤗 didn't turn one up...
Also, another question while I have you 😅 Can you help me understand how the StarCoder example is different from the GPT-2 example? It seems the files are identical except for the way the memory is prepared for the weights?
I tried uploading the q4_1 quantized model of starcoder; you can find it here: https://huggingface.co/nouamanetazi/starcoder-ggml/tree/main
Yes, this is the one I originally tested with. I guess I will take a stab at converting SantaCoder myself and seeing how that works.
Yes, that worked much better
Interesting. I checked out your PR and converted the models myself, and could only get this result, with the original and quantized versions of santacoder.
I pulled from main this morning (but didn't re-convert the models) and noticed the same behavior after working through this fun error during the build. TL;DR: make sure to …
I also noticed if I change …
I also confirmed the quantized santacoder and quantized starcoder models work this way. I can't try the original starcoder because my RAM is limited (24GB).
Sharing all of this in case it's helpful to someone. Thanks again for your work!
Can you please share that model on huggingface?
Here you go: https://huggingface.co/danforbes/santacoder-ggml-q4_1/blob/main/santacoder-ggml-q4_1.bin
I tried to run this model but I only get: …
Any idea what is going wrong?
@danforbes do you have the new ggml model uploaded? Can my M2 Pro with 16GB run StarCoder, or is the memory not enough? Thanks
It should run even if it exceeds your 16GB of RAM by using swap memory, but it will be extremely slow (like the case for …)
@NouamaneTazi Thanks
Adds support for StarCoder and SantaCoder (aka smol StarCoder).
Quickstart:
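The collapsed quickstart isn't reproduced here, so below is a minimal sketch of the usual steps for a ggml example; the build target names, converter script path, model filenames, and flags are assumptions based on the repo's other examples, not the PR's exact commands:

```bash
# Build the example binaries (target names assumed to match the example name)
git clone https://github.com/ggerganov/ggml
cd ggml && mkdir build && cd build
cmake .. && make -j4 starcoder starcoder-quantize

# Convert a Hugging Face checkpoint to GGML (script path is an assumption)
python3 ../examples/starcoder/convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder

# Run a completion: -m selects the model file, -p the prompt, -n the number
# of tokens to generate (the model filename below is hypothetical)
./bin/starcoder -m santacoder-ggml.bin -p "def fibonacci(n):" -n 64
```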
Performance for Santacoder on M1 Pro:
Performance for StarCoder on M1 Pro:
Pretty slow, as it requires 30GB of RAM while my laptop only has 16GB (the memory requirement could still be reduced by using MQA instead of MHA, which could be done in a follow-up PR; see the rough estimate below)
Performance for StarCoder on DGX (a device with plenty of CPU RAM):
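To make the MQA note above concrete, here is a back-of-the-envelope KV-cache estimate, assuming StarCoder's published configuration (n_layer = 40, n_head = 48, n_embd = 6144, n_ctx = 8192) and a 4-byte f32 cache; actual numbers depend on the cache dtype and the context length used:

$$
\begin{aligned}
M_{\text{MHA}} &= 2 \cdot n_{\text{layer}} \cdot n_{\text{ctx}} \cdot n_{\text{embd}} \cdot 4\,\text{bytes} = 2 \cdot 40 \cdot 8192 \cdot 6144 \cdot 4 \approx 16.1\,\text{GB} \\
M_{\text{MQA}} &= \frac{M_{\text{MHA}}}{n_{\text{head}}} \approx \frac{16.1\,\text{GB}}{48} \approx 0.34\,\text{GB}
\end{aligned}
$$

MQA shares a single K/V head across all 48 query heads, so the attention cache shrinks by a factor of n_head; the memory for the model weights themselves is unaffected.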
Result of quantizing the models:
- bigcode/gpt_bigcode-santacoder
- bigcode/starcoder
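The collapsed quantization logs aren't reproduced here. For reference, a minimal sketch of how the quantization step is likely invoked, assuming the tool follows the same convention as the repo's gpt-2 example (f16 input, quantized output, and an integer type code, where 2 = q4_0 and 3 = q4_1; the filenames and type codes are assumptions):

```bash
# Hypothetical filenames; 3 selects q4_1 under the assumed type-code convention
./bin/starcoder-quantize santacoder-ggml-f16.bin santacoder-ggml-q4_1.bin 3
./bin/starcoder-quantize starcoder-ggml-f16.bin starcoder-ggml-q4_1.bin 3
```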
Next TODOs:
- Handle the `<|endoftext|>` token for the santacoder model

cc @ggerganov